ABSTRACT

A discussion of the nature of speech is presented, followed by a
review of speech processing to date, with emphasis on the
characteristics of speech which must be retained for intelligibility.
Methods of measuring speech intelligibility are described. The relative
merits of abrupt and gradual audio clipping of speech are investigated,
and two-tone and articulation test results are presented showing that
there is no significant difference between these methods of clipping
with respect to speech intelligibility. Translating speech to radio
frequencies, clipping, filtering and retranslating to audio, in order to
improve the peak to average value ratio of the audio frequency signal
prior to transmitting it through a noisy channel, is investigated.
Two-tone and articulation test results are presented showing that this
processing results in a 20% improvement in speech intelligibility over
audio clipping and filtering.

1. Introduction.

In spite of all his attempts to sophisticate his systems of
communications, man has yet to devise a more effective means than
ordinary speech. While the redundancy and lack of logic of some aspects
of speech are obvious, there is no other means available to us that so effectively
performs the mission of a communications system, which is to transfer 
thoughts or ideas from one human brain to another. No other method of 
communication can so precisely indicate the exact meanings that the 
individual "transmitting" desires the individual "receiving" to under- 
stand. Speech is limited, of course, by language, vocabulary, and so 
on. 

When it is desired, however, to transmit thoughts, or to communi- 
cate, over a distance of more than a few feet, we discover that speech 
has further limitations or drawbacks. When we attempt to use speech in 
an electronics communications system that is peak-power-limited, and to 
transmit this speech in a noisy environment, we find that these draw- 
backs can be serious impediments to effective communications. Hence, 
for nearly forty years (25) men have been studying ways in which to 
process speech to aid in achieving better communications. The main idea
has been to process speech in certain ways to remove its disadvantages
as a communications means, while retaining as much of its ability to
convey meaning to the listeners as possible. The measure of the success
of a speech processing system has been the degree by which
intelligibility is improved, for a given set of conditions, over
unprocessed speech. Generally there has not been too much concern,
through the years, over obtaining high quality speech reproduction for
communications purposes, but only over obtaining high intelligibility.


In the succeeding sections there will be given a brief description 
of the nature of speech and a review of what types of things have been 
done in speech processing to date, and with what results. Then there 
will be a short discussion of methods of determining speech intelligi- 
bility, followed by a description of, and comments on the value of two 
new ideas in speech processing. These ideas consist of the following: 
First, it might be possible to reduce the distortion introduced by audio 
speech clipping, which, as we will see, is a common method of speech 
processing, by choosing a clipper with a gradual input-output
characteristic, rather than the normal one wherein clipping occurs
abruptly at some particular level. Second, it should be possible to
improve the intelligibility of an audio signal by translating it to
radio frequencies (that is, generating a single-sideband wave), then
clipping it, filtering it, and translating it back to the audio range
again. The results of intelligibility tests on these systems will be
presented and discussed in the hope of providing further understanding
of speech and speech processing.

2. The Nature of Speech. 

Speech can be compared to a modulated carrier signal (5), the nature 
of which varies quite a bit with time. For the vowels or voiced sounds,
the carrier consists of tones generated by the vocal cords, while for the 
consonants or unvoiced sounds the carrier is like broadband noise (18). 
The modulation consists of: 

(a) Turning on and off the carrier. 

(b) Frequency modulation by emphasis, inflection and so on. 

(c) Modification of the harmonic content of the carrier. 

(d) Amplitude modulation. 


As with any other waveform, speech may be represented in the
frequency domain or the time domain. In the frequency domain we see
that for vowels, intensities are concentrated in one or more distinct
frequency regions, called formant regions. Each vowel sound has its own
set of characteristic formant regions, although these are not necessarily
the same when the sound is uttered by different people. The consonants
have components in the frequency domain that generally lie higher than
those of the vowels and are of lower intensity. Here the intensities tend
to be scattered continuously over the spectrum, hence the noise-like
description for the carrier of a consonant as given above (10). This
distribution of the intensities in consonants is caused by the fact that
they are not produced by the vibration of the vocal cords, as are vowels.
The average intensity spectrum of speech is shown in fig. 1 (10).
Here we see a sharp drop after about 600 Hz. The formant regions are
typically below 3000 Hz. for adult speech and for vowels three are usually
found (21). Figure 2 shows the formant regions for the ee sound in
"proceedings" where a fourth formant at 4000 Hz. is present (21).
The formant regions occur at harmonics of the fundamental frequency
of the voice which ranges from about 90 Hz. for a deep-voiced man to 300
Hz. for a high-voiced woman (8).

As we will see in our discussion of speech processing, a great deal 
can be done to speech that will still yield intelligibility. For some 
time the search has been on to discover what elements in speech remain 
invariant under these sometimes radical alterations that still result in
intelligibility. This search has narrowed down to the frequency spectrum.
Agreement has more or less been reached that if the formant regions are
not severely altered the intelligibility of the speech will not suffer
unduly. The most striking example of this is the formant vocoder. This
device locates and measures the energy in the formant regions. This 
information can be coded, transmitted, and intelligible speech reproduced 
at the receiver (21). In 1959 here at the U.S. Naval Postgraduate School, 
S.R. Wilde devised a scheme for speech synthesis using the formant re- 
gions that resulted in intelligible speech using only 140 Hz. of band- 
width. 

In these vocoders we see that the only information used in the
original wave is that contained in the power spectrum. It has been shown
that the information contained in the spectrum, the autocorrelation
function, and the average number of zero crossings of the time domain
waveform are all three equivalent, and that the formant movements can be
approximated by the running averages of the number of zero crossings of
the original and differentiated waves (2).
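
The running zero crossing average mentioned above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the method of reference (2): the 20 ms window, the 48 kHz sampling rate, and the synthetic two-component test wave (a strong 500 Hz tone standing in for a low formant and a weak 2000 Hz tone standing in for a higher one) are all hypothetical choices. The differentiated wave is approximated by a first difference.

```python
import numpy as np

def zero_crossings(x):
    """Count sign changes between successive samples of x."""
    signs = np.signbit(x)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def running_zero_crossing_rate(x, fs, window_s=0.02):
    """Zero crossings per second in consecutive windows of window_s seconds."""
    n = int(round(window_s * fs))
    rates = [zero_crossings(x[start:start + n]) / window_s
             for start in range(0, len(x) - n + 1, n)]
    return np.array(rates)

fs = 48000                       # assumed sampling rate
t = np.arange(fs) / fs           # one second of signal
# Hypothetical test wave: strong low component plus weak high component.
x = np.sin(2 * np.pi * 500 * t) + 0.2 * np.sin(2 * np.pi * 2000 * t)
dx = np.diff(x) * fs             # first-difference approximation of d/dt

rate_x = running_zero_crossing_rate(x, fs).mean()
rate_dx = running_zero_crossing_rate(dx, fs).mean()
print(rate_x, rate_dx)
```

In this sketch the crossing rate of the original wave follows the strong low component, while differentiation emphasizes the high component enough to raise the crossing rate, which is the sense in which the two rates together can track more than one formant.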

3. Speech Processing, General. 

The subject of speech processing is generally concerned with answer- 
ing the following question: What characteristics of speech are undesirable, 
and what can be done to eliminate them, while not altering the power
spectrum of the wave a great deal? In a peak power limited system we are
interested in a signal with a low peak to average value ratio. With
such a signal we can achieve the best average signal to average noise 
ratio when we attempt to transmit our signal through a noisy environ- 
ment. The normal peak to average ratio of speech, however, is 14.5 db 
(18). This is an undesirable feature of speech which we would like to 
eliminate. Also, as we have seen, speech covers a bandwidth of around 
5000 Hz. Obviously, it would be nice to reduce this if possible. The 
following two sections will discuss the efforts that have been put forth 
to accomplish these two objectives while still retaining intelligibility. 
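
As a rough illustration of the peak to average ratio discussed above, the following sketch computes it in db for a sampled waveform. It assumes that "average" means the r.m.s. value of the wave, which the text does not state explicitly; the function name and the 1000 Hz test tone are hypothetical.

```python
import numpy as np

def peak_to_average_db(x):
    """Ratio of peak magnitude to r.m.s. value, in db (r.m.s. is an assumption)."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(np.max(np.abs(x)) / rms)

t = np.arange(48000) / 48000.0
sine = np.sin(2 * np.pi * 1000 * t)
print(round(peak_to_average_db(sine), 2))  # 3.01 for a sine wave
```

On this measure a sine wave sits at about 3 db, so the 14.5 db quoted for normal speech leaves a good deal of transmitter peak capability unused for most of the wave.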
4. Audio Speech Processing. 

The first step in the effort to reduce the peak to average ratio of
speech was to clip the peaks of the speech wave. In 1946 J.R. Licklider
found that for such a system as we have described maximum intelligibility
is achieved by clipping the peaks of the speech wave and using the
available power for the rest of the wave. He also attempted center
clipping, wherein the center portion of the wave is removed and only the
peaks are passed. This, however, resulted in very poor intelligibility
beyond a few db of clipping (12).

The big difference between these two types of clipping is that peak
clipping does not alter the zero crossing characteristics of the time
waveform while center clipping does. This can be seen in fig. 3. Thus, as
we have seen, center clipping alters one of the invariants and we would
expect intelligibility to suffer. Licklider also performed various
degrees of linear rectification on speech signals and found that
articulation began to suffer just as half-wave rectification was reached,
or just at the point where the zero crossings began to be altered.
Figures 4 and 5 show the results Licklider obtained using articulation
tests as the measure of intelligibility.
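
The distinction between peak clipping and center clipping can be sketched numerically. The two functions below are idealized models, not the circuits used by Licklider, and the 440 Hz test tone and clipping level of 0.5 are hypothetical. The sketch checks the sign of every sample: peak clipping leaves the sign pattern, and hence the zero crossings, untouched, while center clipping forces the low-level portions of the wave to zero.

```python
import numpy as np

def peak_clip(x, level):
    """Limit the wave to +/-level; samples below the level pass unchanged."""
    return np.clip(x, -level, level)

def center_clip(x, level):
    """Remove the portion of the wave within +/-level; only the peaks pass."""
    return np.sign(x) * np.maximum(np.abs(x) - level, 0.0)

fs = 8000
t = np.arange(800) / fs
x = np.sin(2 * np.pi * 440 * t)          # hypothetical test tone

same_after_peak = np.array_equal(np.sign(peak_clip(x, 0.5)), np.sign(x))
same_after_center = np.array_equal(np.sign(center_clip(x, 0.5)), np.sign(x))
print(same_after_peak, same_after_center)  # True False
```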

In 1948, Licklider, together with I. Pollack, applied himself to a 
further study of the effects of various types of processing on speech 
intelligibility (13). They investigated the effects of integrating, 
differentiating, and clipping of the wave form on speech intelligibility 
without noise. Figure 6 illustrates the effects of various combinations 
of these steps on a sine wave and a speech wave, as far as appearance in 
the time domain is concerned. This study discovered the following: 

(a) Differentiation and integration alone do not affect
intelligibility to a significant degree.

(b) Infinite (very hard) clipping alone causes a decrease of
intelligibility of about ten percent below (a).

(c) Infinite clipping preceded by differentiation caused no
significant decrease in intelligibility.

(d) Infinite clipping preceded by differentiation followed by
integration yielded the same results as (c).

(e) Infinite clipping followed by differentiation had no effect on
intelligibility other than that caused by the clipping alone, but the
quality of the resulting speech was worse.

(f) Infinite clipping followed by integration caused no further
degradation of intelligibility over clipping alone, but the quality of
the speech was improved.

(g) Infinite clipping preceded by integration resulted in very poor
intelligibility, with scores 70% below those of (a).

(h) Infinite clipping preceded by integration followed by
differentiation resulted in even poorer scores, 80% below those of (a).


The integrator and differentiator used in these tests are shown in
figures 7 and 8. Differentiation serves to 'tilt' the spectrum up. It
introduces six db less attenuation for each octave increase in frequency.
Integration has the opposite effect, tending to tilt the spectrum
downward six db per octave. Looking again at fig. 1, we see that the
intensity of the high frequencies in speech is much less than that of the
low frequencies in natural speech. When we differentiate then clip we are
emphasizing the highs before clipping. Thus in the clipped wave, the
highs, which carry much of the intelligibility, are less likely to be
masked by noise. We are of course changing the quality of the speech
in doing this. When we integrate before clipping, we do the opposite and
the highs can be completely lost. Since clipping alone tends to bring
the lows down closer to the highs in intensity, clipping followed by
integration will result in more natural sounding speech. On the other
hand clipping followed by differentiation will result in worse speech
quality since the normal ratios of intensities are further changed.
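
The six db per octave figure follows directly from the frequency responses of an ideal differentiator and an ideal integrator, as the following sketch shows. The ideal responses are an assumption made for illustration; the actual circuits of figures 7 and 8 would approximate them only over a limited band.

```python
import numpy as np

def gain_db(h):
    """Magnitude of a complex frequency response, in db."""
    return 20.0 * np.log10(np.abs(h))

f = np.array([500.0, 1000.0, 2000.0, 4000.0])   # frequencies one octave apart
differentiator = 1j * 2 * np.pi * f             # ideal d/dt response
integrator = 1.0 / (1j * 2 * np.pi * f)         # ideal integral response

print(np.diff(gain_db(differentiator)))  # about +6.02 db per octave
print(np.diff(gain_db(integrator)))      # about -6.02 db per octave
```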

Thus we can say that infinite clipping preceded by differentiation
can be used to reduce speech to a bivariate code and integration can be
used to retrieve natural speech. However it has been found that
differentiation before clipping raises the peak to average ratio of the
wave by 4 db to 18.5 db (18). Thus we would have to clip harder and
amplify more after clipping. Since we are interested mainly in
intelligibility, it is doubtful whether this differentiation is worth it.
We can see that integration before clipping is just the opposite of what
we want to do with a speech wave.

So far we have discussed clipping only with reference to a fixed
signal to noise ratio, or with reference to no noise at all. Pollack
discovered that infinite peak clipping improved intelligibility for a
given signal to noise ratio until high signal to noise ratios were
reached (19). This decrease in the benefits of clipping is expected
since, as we have seen, infinite clipping does reduce intelligibility
by about ten percent with no noise present. This can be explained by
considering the distortion introduced by clipping as noise. Then,
once the actual noise falls below a certain level, the noise introduced
by clipping will outweigh the benefits gained by clipping (18). In later studies (20)
Pollack investigated the effect of clipping on speech further and found 
that clipping was definitely beneficial at poor signal to noise ratios. 
For a five db signal to noise ratio he determined that when the peak of 
the speech wave was clipped 24 db, in order to achieve the same intelligi- 
bility the gain had to be increased to 13 db, resulting in an improvement 
of 11 db. 

As has been pointed out previously, it would also be nice if the 
bandwidth of speech could be reduced. Investigations have been carried 
out to determine the effects of limiting the frequencies of the speech 
wave form. Among these were those carried out by Egan and Wiener at the 
Harvard Psycho-Acoustical Laboratories. These results show that intelli- 
gibility scores vary only about eight percent below the full bandwidth 
case when the speech frequencies are limited to 340 and 3900 Hz. As long 
as the pass band for speech is in this range intelligibility does not 
suffer. The important thing is that most of the formant regions must be 
included in the pass band (7). Figure 9 shows the effect of filtering 
on the intelligibility of speech. 

It has been determined that if speech is limited to a given band of
frequencies, the intelligibility of a clipped relative to an unclipped
signal is a function of the signal to noise ratio alone (19). We have
seen how clipping alone introduces a decrease in intelligibility at high
signal to noise ratios. This effect is shown to decrease if the lower
frequencies of the speech are removed prior to clipping (19). The lower
frequencies contain nearly all voice fundamentals. The formant regions,
however, are at harmonics of the voice fundamentals. The clipping
process, as we will see in section 8, introduces harmonics of the
frequencies contained in the original wave. Thus if the lower frequencies
are present when a speech wave is clipped, the harmonics generated by the
clipping process lie right where the formant regions should be and thus
alter them. Also, as we shall see in section 8, the clipping process
introduces intermodulation products among the frequencies present in the
original wave. These products will also lie in or near the formant
regions if the low frequencies are present in the unclipped wave. Thus we
can see that the "noise" generated by clipping can be reduced by removing
frequencies below the highest expected voice fundamental, about 300 Hz.
We cannot completely eliminate frequency distortion caused by clipping.
Harmonics and intermodulation products from all frequencies present in
the wave to be clipped will appear as undesired frequency components in
the clipped wave. The thought occurs that perhaps some particular type of
clipper can be found that will reduce these undesired components.
Section 8 is devoted to a presentation of an idea along these lines and
to showing intelligibility test results comparing two divergent types of
clippers.
5. R-F Speech Processing. 

So far in the discussion of speech processing we have only been con- 
sidering operations on the speech wave at audio frequencies. In communi- 
cations systems, however, we usually intend to translate our intelligence 
to radio frequencies before transmitting it through any appreciable noise. 
Focusing our attention on radio frequency processing we see that the
single sideband system of modulation lends itself very well to a study of
such processing. Here we have an opportunity to study the effects of
clipping at three places in the system: at the audio frequencies, at r-f
with the double sideband signal, and at r-f with the single sideband
signal. In fact, an extensive study at the Montana State College in 1962
did just that (27). In this project clipping of various degrees was
performed at each point in a single sideband system (at audio, double
sideband, and single sideband), with appropriate post-clipping filtering to
regain bandwidth. The processed signals were mixed with varying degrees 
of noise and signal intelligibility of the wave after detection was 
measured. In addition, combinations of clipping at all three places 
were tested, as well as various methods of achieving high clipping levels, 
such as clipping one-half the desired amount, filtering, and then clip- 
ping the other half. 

The results of this study show that single sideband clipping yields
significantly higher intelligibility scores than do audio or double
sideband clipping, or any combination of the three. When clipping at
single sideband is done, the frequencies being clipped are the same ones
as in the original audio wave, but after they have been translated to
radio frequencies. Now the formant regions, for instance, no longer bear
harmonic relationships to each other. When we clip at r-f, the harmonics
and intermodulation products are "splattered" over a much wider frequency
range, so it is possible to filter out all but those occurring
immediately around the carrier frequency. Thus, when the wave is
demodulated we have many fewer undesired components present.

In double sideband clipping we have twice as many frequencies pre- 
sent in the wave to be clipped and so end up with many more undesired 
components too close to the carrier to filter out without removing our 
intelligence. 

In single sideband clipping we do have a repeaking problem as a re- 
sult of the filtering. In the Montana study this was observed to reach 
four db for very hard clipping. However, this is still a considerable 
saving over the original 17.5 db peak to average ratio of unclipped single 
sideband speech (18). 

If a speech wave is infinitely clipped at the audio level and is 
used to modulate a single sideband wave with an r-f pass band of f + 300
to f + 3000, the peak to average ratio of the resulting single sideband
signal is about 7.3 db (18). Thus, not only do we have more distortion 
present with audio clipping, but we do not achieve as low a peak to 
average power ratio as with single sideband clipping. 

The effects are so well recognized now that the Collins Radio
Company, in its single sideband manual, categorically states that speech
clipping at audio frequencies "is of no practical value in a single
sideband transmitter" (1).
Returning to the Montana study for a moment, this group points out
that iterative clipping, that is clip, filter, clip again, has no
advantage over single sideband clipping in one stage followed by
filtering to regain bandwidth. In addition various combinations of
differentiation, integration and clipping were investigated with no
significant result (27).

The Voice of America radio has used single sideband clipping to 
achieve a 9 db improvement in signal to noise ratio in combating jamming 
(11). Single sideband clipping has been applied to amateur radio also 
with excellent results (24). 

The above discussion of speech processing at radio frequencies was
with reference to a system wherein the noise is introduced at the radio
frequencies, that is, a system which is concerned with transmitting a
radio frequency wave through a noisy channel. But consider a peak power
limited system where the noise is introduced at the audio frequencies,
such as a public address system or the "one MC" and "21 MC" systems
aboard U.S. Navy ships. We have seen that it would be advantageous to
perform clipping on the audio wave to improve intelligibility. But might
it not be feasible to introduce a device into the system in which the
signal is translated to a radio frequency, clipped, filtered, then
translated back down to the audio frequencies? Should not this process
result in even greater intelligibility due to the removal, in the
filtering of the clipped wave, of distortion caused by clipping? This
idea will be discussed and investigated in section 9.
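
The translate, clip, filter and retranslate process described above can be sketched digitally. This is a minimal numerical illustration under stated assumptions, not the equipment of this project: the sampling rate, the 24 kHz carrier, the FFT-based single sideband generation, and the ideal brick-wall filter passing the carrier plus 300 to 3000 Hz are all hypothetical, and the function names are invented for the example.

```python
import numpy as np

def analytic_signal(x):
    """Complex signal whose spectrum keeps only the positive frequencies of x."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def rf_clip_audio(audio, fs, fc, clip_level, f_lo=300.0, f_hi=3000.0):
    """Translate audio to an upper sideband at fc, clip, filter, translate back."""
    t = np.arange(len(audio)) / fs
    carrier = np.exp(2j * np.pi * fc * t)
    ssb = np.real(analytic_signal(audio) * carrier)      # single sideband wave
    clipped = np.clip(ssb, -clip_level, clip_level)      # abrupt clipping at r-f
    spectrum = np.fft.rfft(clipped)
    freqs = np.fft.rfftfreq(len(clipped), d=1.0 / fs)
    # Idealized band-pass filter: keep only fc + f_lo through fc + f_hi.
    spectrum[(freqs < fc + f_lo) | (freqs > fc + f_hi)] = 0.0
    filtered = np.fft.irfft(spectrum, n=len(clipped))
    return np.real(analytic_signal(filtered) * np.conj(carrier))

fs, fc = 192000, 24000                    # assumed sampling rate and carrier
t = np.arange(19200) / fs                 # 0.1 second of signal
audio = np.cos(2 * np.pi * 700 * t) + np.cos(2 * np.pi * 1900 * t)
processed = rf_clip_audio(audio, fs, fc, clip_level=0.2)
```

With a very large clip_level nothing is clipped and the sketch returns the band-limited audio unchanged. With hard clipping, only the products that fall inside the r-f pass band survive the filter, so nothing outside 300 to 3000 Hz reaches the audio output.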

6. Intelligibility Measure: The Articulation Test. 

We have seen how various types of speech processing used in the past
affect speech intelligibility, and we have mentioned two additional ideas
that we will discuss further on. But no discussion has been made about
how speech intelligibility is measured.

The most commonly accepted method of testing the intelligibility
of a speech processing system is the articulation test. First developed
by the Bell Telephone Co. (9), these tests consist of trained listeners
listening to a selected list of sounds, words, or sentences and recording what
they hear. The results are compared with the lists actually transmitted 
through the system under test and a mean articulation score is computed. 
This is compared against known scores achieved using other systems to 
determine the relative merits of the system under test with respect to 
intelligible transmission or reproduction of speech. 
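
The scoring step described above can be sketched as follows. The exact scoring rules are not given in the text, so this sketch assumes a word counts as correct only if it matches the transmitted word exactly after trimming spaces and ignoring case; the function names and the sample words are hypothetical.

```python
def articulation_score(transmitted, recorded):
    """Percentage of transmitted words that one listener wrote down correctly."""
    if len(transmitted) != len(recorded):
        raise ValueError("lists must be the same length")
    correct = sum(
        sent.strip().lower() == heard.strip().lower()
        for sent, heard in zip(transmitted, recorded)
    )
    return 100.0 * correct / len(transmitted)

def mean_articulation_score(transmitted, listeners):
    """Mean of the individual listeners' scores for one list."""
    scores = [articulation_score(transmitted, answers) for answers in listeners]
    return sum(scores) / len(scores)

# Hypothetical four-word list and two listeners' answer sheets.
sent = ["ball", "fern", "hock", "jab"]
answers = [["ball", "fern", "hawk", "jab"], ["ball", "burn", "hock", "jab"]]
print(mean_articulation_score(sent, answers))  # 75.0
```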

There are many ways to conduct articulation tests. The test results
shown in sections 8 and 9 were obtained using the methods
described by the Harvard Psycho-Acoustical Laboratory study,
"Articulation Testing Methods II" (16). In these tests phonetically balanced
word lists were used. These are lists in which speech sounds occur with 
approximately the same frequency as they occur in the English language, 
and the words are so chosen that there are no very easy or very difficult 
words in each list. That is, all the words are of uniform, intermediate 
difficulty. This eliminates "dead wood" words which would always be missed 
or always be heard correctly and thus give no information on intelligibil- 
ity. 

Word lists rather than sentence lists or sound lists were used for
the following reasons: Sound lists require a very careful "talker" and
very well trained listeners. Neither was readily available. Sentence
lists are easier than word or sound lists in this respect, but the time
needed to give and grade tests composed of sentence lists was considered
excessive. Twenty phonetically balanced word lists were used. The order
of the words on each list was randomized with the aid of a table of
random numbers and the order of some of these lists was reversed to give
additional lists. Care was taken to ensure that the listeners did not
hear a list and its inverse version within too short a time, and when it 
was necessary to use a list for the second or third time care was also 
taken to make sure a sufficient amount of time had elapsed so that the 
listeners were not able to recognize the order of the words. A total of 
32 lists were generated. Samples of these are given in Appendix II. The 
Harvard study contains all twenty of the original lists, with the words 
in alphabetical order. 

For each word list the peak list word was determined. This is the
word which resulted in the highest amplitude for each list. This word
was used to determine the peak signal for each list in order to set the
clipping level C and the signal to noise ratio A as defined below. A list
of the peak list words and their relative amplitudes is contained in
Appendix II.

These word lists were initially recorded with a signal to noise 
ratio of 45 db on a Berlant Concertone tape recorder. The microphone 
used was an Altec 6604 dynamic. A peak reading meter on the recorder and
a Tektronix 515A oscilloscope were used to keep the recording voice at a
constant level.

Both series of tests described in sections 8 and 9 involve clipping 
and signal to noise ratios. Since we are concerned with random noise and 
peak power limited systems these parameters were defined as: 


    A = signal to noise ratio = 20 \log_{10} (E_p / E_n)

    C = clipping level = 20 \log_{10} (E_c / E_s)

where

    E_p = peak signal at point where noise is introduced
    E_c = peak signal before clipping
    E_s = peak signal after clipping
    E_n = r.m.s. noise voltage

The noise voltage was generated by a General Radio Company type
1390-B random noise generator. E_n was measured by a calibrated meter on
the face of the generator which was connected directly across its output.
E_p, E_c, and E_s were measured with an oscilloscope, using the peak list
words.
Each test consisted of two of the phonetically balanced word lists
of fifty words each. Each word was given as the last word of a carrier
sentence. The carrier sentence used was "The word you should write is
____." Only the word under test, always the last word in the sentence,
was recorded by the listener. The carrier sentence was used for
two reasons (16). First, the listener is prepared for the test word and
the missing of words due to inattention is reduced. Second, the carrier
sentence helps to keep the voice level even while recording the lists.

A space of three to four seconds between carrier sentences was found to 
be adequate. As recommended in the Harvard study (31), six listeners 
were used. In the tests described in section eight these were U.S. Navy 
enlisted men, all of about 22 years of age. The minimum educational back- 
ground of this group was three years of college training. Unfortunately 
this group was not available for the test described in section nine. In 
these tests 5 listeners were used, three of whom were U.S. military offi- 
cers and college graduates, one of whom was a U.S. Navy enlisted man with 
some college training and one of whom was a U.S. Navy enlisted man with a 
high school education. No significant differences in the scores of these 
listeners were noted. 


To avoid fatigue the testing procedures were as follows: The tests
were grouped into sessions of three tests each, each test being of about
14 minutes duration. Between each test the listeners were given about
one minute to adjust headsets, chairs and so on. After each session,
which lasted around 45 minutes, a 15 minute break was given. No more
than three consecutive sessions were held before stopping for lunch or
quitting for the day.

The listening facility was in a small quiet room. Each listening
position was numbered and consisted of a chair, a writing space, a volume
control and a headset. The headsets were standard 300 ohm communications
headsets used by the Navy. Foam earpads were added to each for comfort
and to help shield noise.

The listeners recorded what they heard on forms like that shown in
Appendix II. In order to ensure that the positions did not affect the
scores, the average rank of the scores made at each position was
calculated. Similarly, to check for significant differences in the listeners,
the average rank of each listener's scores was also determined. These 
two figures were made independent by having the listeners shift positions 
after each test, thus ensuring that no listener stayed at one position 
too long. These results, shown in Appendix II, were such that there was 
no substantial difference in listeners or positions. 
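
The average rank check described above can be sketched as follows. The text does not say how tied scores were ranked, so this sketch assumes that tied scores share the mean of the ranks they occupy, with rank 1 given to the highest score; the function names and the sample scores are hypothetical. The same routine serves for listeners or for positions, depending on how the columns are arranged.

```python
def ranks(scores):
    """Rank the scores of one test: 1 = highest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result

def average_ranks(tests):
    """tests: one list of scores per test, one column per listener or position."""
    per_test = [ranks(scores) for scores in tests]
    columns = len(tests[0])
    return [sum(r[i] for r in per_test) / len(per_test) for i in range(columns)]

# Hypothetical scores for three listeners on two tests.
print(average_ranks([[80, 70, 90], [60, 60, 50]]))  # [1.75, 2.25, 2.0]
```

Average ranks that all lie close to the middle rank indicate that no one listener or position consistently scored higher or lower than the others.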

All listeners' scores are given in Appendix II for each series of
tests. Further details on each series of tests may be found in section
eight or nine and in Appendix II.

7. Other Intelligibility Measures. 

While the articulation test is the most widely accepted method of
determining intelligibility, as well as the most obvious, work has been
done on other methods as well. These methods are based generally on the
idea that intelligibility is a function of how well the running power
spectrum of the wave is preserved by the system under test. In one case 
(22), equipment was built and tested which compared the running power 
spectrum of the speech before and after processing and calculated an in- 
telligibility index. This index seemed to compare favorably with articu- 
lation test scores. In another case (26), devices were designed to 
measure the average number of zero crossings of the speech wave. From 
this information an index of intelligibility was calculated. 

Neither of these two methods seems to have found general acceptance. 
Hence for this project the more conventional articulation test was used. 
8. Gradual and Abrupt Clipping. 

As we have seen, speech clipping at audio frequencies can be used
as a means to reduce the peak to average ratio of speech waveforms in
peak power limited systems. We have seen how such clipping can be very
beneficial in systems where intelligibility in the presence of noise is
of paramount importance, while the quality of the speech heard by the
listener is of secondary importance.

Usually one thinks of a clipper as a device having the characteristics
shown in figure 10. Here the output e_o is a faithful reproduction
of the input e_i up to the point where e_i = C. After this point e_o = C
no matter how large e_i becomes. This will be referred to as an abrupt
clipper, where C is the clipping level.

One can, however, perform clipping with a device with a characteristic
such as that shown in fig. 11. Here clipping begins almost as soon
as e_i becomes greater than zero and e_o reaches some "saturation" point
C, beyond which it remains constant no matter how big e_i becomes. This
will be called a gradual clipper.

Figure 10. Abrupt Clipper

Figure 11. Gradual Clipper


It is the purpose of this section to investigate the relative merits 
of the abrupt and gradual clipper as applied to speech. The criterion
used will be the intelligibility of the clipped wave in the presence of
various degrees of noise with various degrees of clipping.

This investigation was prompted by a remark in an article by Middle- 
ton to the effect that gradual clipping has less effect on the spectrum 
of Gaussian noise than abrupt clipping (15). Davenport has determined 
experimentally that the probability distribution for the noise-like un- 
voiced sounds is approximately Gaussian (4), so it would seem that grad- 
ual clipping would have some advantage over abrupt clipping. 

First it was decided to determine the amount of intermodulation 
distortion introduced by each type of clipper. In order to do this, tests 
were made on a clipped two tone signal. Tones of 1500 and 2500 Hz. of 
equal amplitude were combined and clipped by each type of clipper at 
various clipping levels. The intermodulation components present in the 
clipped wave were then measured with a wave (spectrum) analyzer. 


The gradual clipper consisted of two 1N34A germanium point contact 
diodes, arranged back-to-back and unbiased. The clipping characteristic 
of this device is shown in fig. 12. For the abrupt clipper the same 
diodes were used, each reverse biased by one volt. The characteristic 
of this clipper is shown in fig. 13. 
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Figure 12. Clipping characteristic, 1N34A, no bias 
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Figure 13. Clipping characteristic, 1N34A, one volt bias 
Appendix I shows the equipment setup used in these tests together 
with a description of the instruments used. 
Since the clipper characteristics are odd functions, they can be 
approximated by an infinite series containing only odd terms, such as: 

    e_o = k_1 e_i + k_3 e_i^3 + k_5 e_i^5 + ... 

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Considering only the first five terms of such a series we see that for an 
input of the form: 

    e_i = A cos W_1 t + B cos W_2 t 

the output will contain the following frequencies (18): 

    W_1, W_2, 3W_1, 3W_2, 2W_1 ± W_2, W_1 ± 2W_2, 5W_1, 5W_2, 
    4W_1 ± W_2, 3W_1 ± 2W_2, 2W_1 ± 3W_2, W_1 ± 4W_2 

For the 1500 and 2500 Hz. tones used these frequencies are: 

    1500, 2500, 4500, 7500, 5500, 500, 6500, 3500, 7500, 
    12,500, 8500, 3500, 9500, 500, 10,500, 4500, 11,500, 
    8500, ... (all Hz.) 
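
The frequency list above can be checked by enumerating every combination 
a*W_1 ± b*W_2 whose order a + b is odd and no greater than five. The 
sketch below is only an illustration of that bookkeeping; the function 
name and the max_order parameter are our own and do not come from 
reference (18). 

```python
def intermod_products(f1, f2, max_order=5):
    """Return the sorted distinct frequencies |a*f1 +/- b*f2| for odd a + b."""
    freqs = set()
    for order in range(1, max_order + 1, 2):   # odd orders only: 1, 3, 5
        for a in range(order + 1):
            b = order - a
            freqs.add(abs(a * f1 + b * f2))    # sum product
            freqs.add(abs(a * f1 - b * f2))    # difference product
    return sorted(freqs)


# Two tone test frequencies used in this section, in Hz.
print(intermod_products(1500, 2500))
```

Duplicates in the list above (for example 500 Hz. from both 2W_1 - W_2 
and 3W_1 - 2W_2) collapse to a single entry here, which is why only 
eight distinct distortion frequencies from 500 to 9500 Hz. appear in 
Table I. 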
Table I shows the relative amplitudes of these frequency components in 
db down from the fundamentals when the two tone signal was clipped with 
the indicated type of clipper. In addition the db difference between the 
two clippers (gradual minus abrupt) of each component is shown. We as- 
sume that we want to retain the two tones in the original signal and that 
everything else is clipping "noise" which we desire to minimize. 

It appears that from the standpoint of intermodulation distortion 
there is very little difference between the two types of clippers. 

Next it was desired to see if either clipper introduced a signifi- 
cantly larger harmonic content when clipping a single tone. A tone of 
200 Hz. was chosen to simulate a sound in the range of speech frequencies. 
Table II shows the results of clipping this tone with each type of clip- 
per. 
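
For comparison with measurements of this kind, the harmonic content of an 
idealized clipped tone can be estimated numerically. The sketch below 
makes several assumptions that are not taken from the test setup: the 
abrupt clipper is a perfect limiter, the gradual clipper is modeled by a 
hyperbolic tangent rather than by the measured diode characteristic, and 
a clipping level of 6 db is taken to mean an input peak of twice the 
clipping level C. 

```python
import math


def abrupt_clip(e, c):
    """Ideal limiter: output follows the input until |e| reaches c."""
    return max(-c, min(c, e))


def gradual_clip(e, c):
    """Assumed soft characteristic that saturates smoothly at +/- c."""
    return c * math.tanh(e / c)


def harmonic_amplitude(clipper, c, peak, k, n=4096):
    """Amplitude of the k-th harmonic of a clipped sine of the given peak."""
    re = im = 0.0
    for i in range(n):
        theta = 2 * math.pi * i / n
        out = clipper(peak * math.sin(theta), c)
        re += out * math.cos(k * theta)
        im += out * math.sin(k * theta)
    return 2 * math.hypot(re, im) / n


def db_down(clipper, c, peak, k):
    """Level of the k-th (odd) harmonic in db below the fundamental."""
    fund = harmonic_amplitude(clipper, c, peak, 1)
    return 20 * math.log10(fund / harmonic_amplitude(clipper, c, peak, k))


for k in (3, 5, 7):
    print(k, round(db_down(abrupt_clip, 1.0, 2.0, k), 1),
          round(db_down(gradual_clip, 1.0, 2.0, k), 1))
```

Because both assumed characteristics are odd functions, only odd 
harmonics appear, in agreement with the series expansion given earlier. 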

Here we see that the abrupt clipper does introduce slightly higher 
harmonic components, especially at the higher frequencies. The difference 
between the two clippers is small until the higher harmonics are 
reached. These harmonics, however, are so small that they probably 
couldn't be detected by the ear. It remains to be seen whether these 
small differences in intermodulation distortion and harmonic distortion 
are sufficient to cause a difference in intelligibility, especially if 
the clipped signal is band limited. 

            Clipping level 3.8 db           Clipping level 6.8 db 
Freq.    Gradual   Abrupt   Gradual      Gradual   Abrupt   Gradual 
(Hz.)    clipper   clipper  - abrupt     clipper   clipper  - abrupt 
 500      21.5      19.0      2.5         17.2      13.5      3.7 
3500      22.0      20.8      1.2         17.0      14.8      2.2 
4500      38.0      28.0     10.0         28.0      27.0      1.0 
5500      22.0      21.2      0.8         18.0      14.0      4.0 
6500      22.0      21.1      0.9         17.0      14.8      2.2 
7500      36.0      28.9      7.1         39.5      29.0     10.5 
8500      46.0      54.2     -8.2         40.3      32.0      8.3 
9500      38.8      42.2     -3.4         28.5      30.0     -1.5 

            Clipping level 17.7 db 
Freq.    Gradual   Abrupt   Gradual 
(Hz.)    clipper   clipper  - abrupt 
 500      13.1      10.5      2.6 
3500      13.4      10.0      3.4 
4500      20.5      16.0      4.5 
5500      13.5      15.3     -1.8 
6500      13.6      10.0      3.6 
7500      31.2      29.8      1.4 
8500      36.1      35.4      0.7 
9500      20.5      21.8     -1.3 

Table I 

Distortion components from two tone tests, in db down from 
fundamental. 

                      Clipping level 6.0 db          Clipping level 12.0 db 
Freq.               Gradual   Abrupt   Gradual     Gradual   Abrupt   Gradual 
(Hz.)   Harmonic    clipper   clipper  - abrupt    clipper   clipper  - abrupt 
 200     1st          0         0        0           0         0        0 
 600     3rd         18.0      13.4      4.6        14.0      11.0      3.0 
1000     5th         29.2      27.0      2.2        Zee       18.4      SEN 
1400     7th         40.2      42.2     -2.2        27.2      22.8      4.4 
1800     9th         51.2      36.2     15.0        32.1      28.2      3.9 
2200    11th         72.0      41.2     30.8        46.8      35.6     11.2 
2600    13th         72.0      56.2     15.8        41.0      38.6      2.4 
3000    15th          *        18.9      -          45.1      39.1      6.0 
3400    17th          *        50.0      -          49.2      44.2      5.0 

                      Clipping level 24.0 db 
Freq.               Gradual   Abrupt   Gradual 
(Hz.)   Harmonic    clipper   clipper  - abrupt 
 200     1st          0         0        0 
 600     3rd         12.2      10.2      2.0 
1000     5th         18.0      15.0      3.0 
1400     7th         21.3      18.2      3.1 
1800     9th         24.8      20.8      4.0 
2200    11th         27.1      22.8      4.3 
2600    13th         273]      24.8      4.3 
3000    15th         31.0      26.2      4.8 
3400    17th         34.0      27.9      6.1 

*Too small to measure 

Table II 

Single tone clipping results, in db below fundamental 

In order to determine whether either clipper results in increased 
intelligibility it was decided to conduct articulation tests as 
described in section six. The word lists were played into the clippers 
at the levels necessary to obtain the desired clipping levels. The 
clipped signal was filtered with a pass band of 300 to 3000 Hz. Then 
noise from the noise generator, filtered to the same bandwidth as the 
speech, was introduced at a level corresponding to the desired A as 
defined in section six. The clipping levels (C in section six) chosen 
were 0, 12 db, 24 db, and 33 db; the A's were 3 db, 6 db, 12 db, and 
18 db. The resulting signal was recorded on tape and was played to the 
listeners later. 

Figures 14 (A), (B), (C), and (D) show the results of the articulation 
tests using the unbiased 1N34A's as the gradual clipper and the 
1N34A's with a 1 volt bias as the abrupt clipper. Without the benefit 
of statistical analysis, one could say that there is very little differ- 
ence between the two clippers. One might be tempted to say that the 
abrupt clipper yields slightly higher intelligibility scores than the 
gradual one. Actually, however, only the sets of points 16 and 17 on 
fig. 14(B) and 22 and 10 on fig. 14(@) show a statistically significant 
difference. To determine this, a two-sided Mann-Whitney U test was used. 
This test is one of the most powerful that can be used on data of this 
nature (23). The null hypothesis, H_0, is that the samples of the two sets 
of scores being investigated came from the same population. A significance 
level, α, is chosen and the test determines the probability, assuming 
the null hypothesis is true, of obtaining a value of U as extreme as the 
one observed. If this probability exceeds the significance level α, then 
the null hypothesis is accepted. A significance level of 
0.01 was chosen for this data. For a complete description of the Mann- 
Whitney U Test, with examples, and a discussion of significance levels, 
see Appendix III. 

With the data obtained as described above, and using the Mann-Whit- 
ney U Test with a 0.01 significance level, we see that of the twelve 
sets of data only two caused the null hypothesis to be rejected. Thus 
it can be concluded that there is no significant difference in the two 
sets of data, and the fact that the abrupt clipper appears better is just 
a result of chance. 

To confirm this, further tests were run using a different gradual 
clipper composed of two unbiased 1N69A diodes whose clipper characteristic 
is shown in fig. 15. 

Figure 15. Clipping characteristic 1N69A zero bias 

The results of these articulation tests are shown compared with the 
abrupt clipper results in fig. 16 (A), (B), (C), and (D). Using the 
Mann-Whitney test again with a significance level of 0.01, again only 
two sets of scores, marked 28 and 16, and 23 and 35, show significant 
differences. Note that the differences in these tests are in opposite 
directions. The point labeled spurious in fig. 16(C) is considered too high 
and was not used. We see that the abrupt clipper no longer has higher 
scores. 

Figure 16. ARTICULATION TEST RESULTS 
O = 1N69A zero bias, gradual. 

Thus it is safe to conclude that there is no significant difference 
between gradual and abrupt clippers as regards speech intelligibility. 

In the next section another scheme for speech processing will be 
considered. 

9. Radio Frequency Clipping to Improve Audio Signal Intelligibility. 

As we have seen, it is quite well accepted practice to clip a single 
sideband speech wave in order to improve its peak to average value ratio 
while retaining intelligibility. This has application in systems in which 
it is necessary to transmit the radio frequency wave through a noisy chan- 
nel. In many applications it is desired to transmit speech at audio fre- 
quencies through noisy channels. As mentioned before, examples of peak 
power limited systems in which this is done are ordinary public address 
systems. 

We have also seen that it would be advantageous to perform clipping 
on the audio wave directly to improve intelligibility. But we have noted 
that a great number of harmonic and intermodulation distortion components 
are formed by this clipping process. To reduce this distortion we can 
translate the audio wave to a radio frequency, say as an upper sideband 
signal, clip it, and then filter it to regain the original upper sideband 
bandwidth. The distortion components introduced by the clipping are now 
separated by frequencies of the order of magnitude of the carrier, with 
the exception of the lowest order terms. Passing the clipped r-f wave 
through a filter such as an upper sideband mechanical filter with a pass 
band of the order of magnitude of the audio range, say 3 KHz., will 
eliminate all but the desired audio and these lowest order distortion terms. 
It remains to be seen whether the amount of repeaking involved in the 
filtering and frequency translation of the clipped wave back down to audio 
cancels out the gain in intelligibility due to the reduction in distortion. 
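
The argument above can be illustrated by counting which odd-order 
products of a two tone signal survive the filter when the clipping is 
done at audio and when it is done at r-f. The sketch below is a 
simplified model under stated assumptions: the filter is an ideal one 
passing exactly 300 to 3000 Hz. above the carrier, only products up to 
the fifth order are considered, and the 400 and 700 Hz. tone pair is a 
hypothetical example chosen so that audio harmonics fall inside the 
pass band. Function and parameter names are ours. 

```python
def odd_order_products(f1, f2, max_order=5):
    """Frequencies |a*f1 +/- b*f2| with a + b odd and a + b <= max_order."""
    found = set()
    for order in range(1, max_order + 1, 2):
        for a in range(order + 1):
            b = order - a
            found.add(abs(a * f1 + b * f2))
            found.add(abs(a * f1 - b * f2))
    return found


def surviving_audio(tones, carrier=0, low=300, high=3000, max_order=5):
    """Audio frequencies left after clipping at `carrier` and ideal filtering.

    carrier=0 models clipping directly at audio; a nonzero carrier models
    clipping an upper sideband signal and translating back to audio.
    """
    f1, f2 = carrier + tones[0], carrier + tones[1]
    kept = set()
    for f in odd_order_products(f1, f2, max_order):
        if carrier + low <= f <= carrier + high:
            kept.add(f - carrier)
    return sorted(kept)


low_tones = (400, 700)  # hypothetical tone pair, in Hz.
print("audio clipping:", surviving_audio(low_tones))
print("r-f clipping:  ", surviving_audio(low_tones, carrier=455_000))
```

In this idealized count the harmonics, which at audio fall back inside 
the speech band, are moved to the neighborhood of three and five times 
the carrier and are removed; only difference products such as 
2W_2 - W_1 remain near the wanted tones. 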
To determine the validity of the above statements, a device which 
will be referred to as an "R-F Speech Processer" was constructed. A block 
diagram of this device is shown in figure 17, and detailed diagrams of 
each component are contained in Appendix IV. The audio input signal is 
translated to a double sideband signal by the balanced modulator, using 
the 455 KHz. L-C oscillator to provide the carrier. The lower sideband 
is removed by the first upper sideband filter. The signal is then 
amplified and clipped by the r-f amplifier-clipper. This signal is filtered 
by the second upper sideband filter and returned to audio by the product 
detector, again using the 455 KHz. L-C oscillator to insert the carrier. 


Figure 17. R-F Speech Processer 


To compare the intermodulation distortion generated by the speech 
processer, and to measure the repeaking involved in the filtering and 
frequency translation of the clipped wave, two-tone tests were used. 
The same two tones used in section 8, 1500 and 2500 Hz., were used here. 
Table III shows the results of these tests in the r-f column, while the 
results of the audio clipping with the gradual clipper from section 8 are 
shown for comparison in the a-f column. In Table III we see quite 
distinctly that the R-F Speech Processer causes considerably less 
intermodulation distortion than the audio clipping. The results of the 
repeaking measurements are shown in Table IV. Here we see that no serious 
repeaking occurs in the filtering and translation of the clipped r-f wave 
to audio frequencies. (The repeaking of a 20 db clipped audio wave filtered 
from 300-3000 Hz. is 4.2 db (28).) 
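
As a rough illustration of what repeaking means, consider a single tone 
that is abruptly clipped and then filtered so sharply that only its 
fundamental remains. The fundamental of the clipped wave has a peak 
larger than the clipping level, so the filtered wave "repeaks". The 
sketch below computes this for an ideal limiter. It is an idealization 
and not the measurement procedure used here: it assumes a sine wave 
rather than speech, a filter that removes every harmonic, and a clipping 
level in db defined as the ratio of the unclipped peak to the clipping 
level C. 

```python
import math


def hard_clip(e, c):
    """Ideal abrupt clipper with clipping level c."""
    return max(-c, min(c, e))


def fundamental_amplitude(peak, c, n=4096):
    """Fourier amplitude of the fundamental of a hard-clipped sine wave."""
    acc = 0.0
    for i in range(n):
        theta = 2 * math.pi * i / n
        acc += hard_clip(peak * math.sin(theta), c) * math.sin(theta)
    return 2 * acc / n


def repeaking_db(clip_db, c=1.0):
    """Peak of the filtered wave relative to the clipping level, in db."""
    peak = c * 10 ** (clip_db / 20)
    return 20 * math.log10(fundamental_amplitude(peak, c) / c)


for level in (6, 12, 20):
    print(level, "db clipping:", round(repeaking_db(level), 2), "db repeaking")
```

For very heavy clipping the clipped tone approaches a square wave, whose 
fundamental is 4/π times the clipping level, so the repeaking of this 
idealized case cannot exceed about 2.1 db. 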

In order to determine the effect of this processing on intelligibility, 
it was decided to conduct articulation tests with speech processed 
in this manner. Using the notation introduced previously, 10 tests were 
conducted, with r-f clipping levels of 12 and 24 db and A's of 3, 6, 12, 
and 18 db and the maximum obtainable A with each clipping level. Because 
of the small number of tests involved the pre-recorded method of 
testing was not used. Instead, the word lists described in section 6 
were played through the speech processer directly into the listeners' 
headsets for each condition described above. Further details on the 
equipment setup used in these tests are given in Appendix II. 

The results of these tests are shown in figure 18. Further details 
on test results may also be found in Appendix II. In figure 18 we have 
taken the average of the three audio clipping test scores obtained in 
section 8 for each condition shown and plotted them on the same axes as 
the r-f clipping articulation scores. It can be seen that in each case 
the r-f clipped speech is more intelligible than the speech clipped and 
filtered at audio. 
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Figure 18. ARTICULATION TEST RESULTS 
O = Average scores, audio clipping. 
Δ = R-F clipping. 

Clipping Level       3.8 db            6.8 db            17.7 db 
Frequency (Hz.)    a-f     r-f       a-f     r-f       a-f     r-f 
 500               21.5    60.0      17.2    47.1      13.1    40.5 
3500               22.0    JED       17.0    28.5      13.4    27.5 
4500               38.0    61.0      28.0    IZ        20.5    50.5 
5500               22.0    62.0      18.0    59.0      13.5    2.5 
6500               22.0    55.0      17.0    50.4      13.6    47.0 
7500               36.0    57.0      39.5    32.5      31.2    52.8 
8500               46.0     -        40.3     -        36.1    61.1 
9500               38.8     -        28.5     -        20.5     - 

Table III. Intermodulation distortion of two tones of 1500 and 
           2500 Hz. by r-f and a-f clipping, in db down from 
           fundamentals. 

Clipping Level     Repeaking 
     db               db 
     3.8              0.6 
     6.8              1.6 
    17.7              Sl 
    26.3              6.2 

Table IV. Repeaking associated with filtering and translation 
          to audio of r-f wave clipped to levels shown. 
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Using the highest set of the three audio clipping scores and apply- 
ing the U-test, again at a level of 0.01, we find that only two sets of 
points (r-f and a-f) do not show a statistically significant difference. 
These are marked A and A* on figure 18c. Thus we can conclude that pro- 
cessing speech with our r-f speech processer is indeed advantageous. The 
average improvement in articulation over the audio processing is 20.5%. 

In addition to the above, tests were run at the best A available 
through the processer to determine the effect of the r-f processer alone 
on intelligibility. At 12 db of clipping the best A obtainable was 36.5 
db, while at 24 db of clipping the best A was 30 db. The articulation 
scores obtained under these conditions were 93% for the 12 db case and 
90% for the 24 db case. When we consider that even under the best 
conditions a few words will be missed by the best listener, we can realize 
that these scores really indicate that r-f clipping and filtering alone 
have an almost negligible effect on intelligibility. In fact the loss 
of intelligibility that did occur could be attributed to distortion 
introduced in the balanced modulator or product detector and might be 
independent of the actual clipping and filtering process. 

10. Conclusions. 

We have seen that as long as the formant regions are not too severely 
distorted or the zero crossings of the time waveform are not radically 
altered, we can do a lot to speech to improve its characteristics 
vis-a-vis our communications systems while not impairing its intelligibility. 
We have noted that this is due to the natural redundancy of speech. 

In our investigation of clipping we have discussed the "noise" 
introduced by the clipping process itself. We have discussed and investigated 
two ideas for the minimization of this noise. One of these, the idea of 
gradual versus abrupt clipping, we found to be of no practical value, 
except that we now know that if we are given a choice we might as well 
avoid the need for biasing and use a gradual, unbiased diode clipper 
rather than an abrupt, biased one, since they will result in the same 
level of intelligibility. The other idea, of processing the speech at 
r-f, shows merit. We found that an increase of 20% in intelligibility 
could be achieved over ordinary audio clipping by this method. This 
confirms the ideas about the "noise" introduced by clipping and shows how it 
is reduced substantially by the filtering of the clipped r-f wave. 

APPENDIX I 
SINGLE AND TWO TONE TESTS 
1. Single tone tests. 
The equipment arrangement used for the single tone tests is shown 
below: 

The Hewlett-Packard 200AB audio oscillator, when set at 200 Hz., provided 
the following output: 


Frequency db 
200 0 
400 -56.0 
600 -67.0 


The clipper chassis for the gradual clipper is shown below: 

(Circuit diagram; legible component values: 10K, 10K.) 

For the abrupt clipper the circuit below was used. 

(Circuit diagram; legible labels: 10K, 1N34A, 10K.) 

The bias was provided by Hewlett-Packard 721A's, which were adjusted to 
give a symmetrical one volt clipping level. 

The H.P. wave analyzer has an accuracy of 1% + 5 cps in frequency and 
± 5% in voltage. 
2. Two tone tests. 

The equipment arrangement used here was as shown below: 

(Block diagram; legible labels: CLIPPER, ANALYZER.) 

The output of the two tone generator, taken at point A, with no clipper 
attached, across a 10k ohm load was: 

Frequency     db       Frequency     db       Frequency     db 
  1500         0         3500      -69.0        6500      -72.0 
  2500         0         4500      -71.1        7500      -60.2 
  3000       -68.0       5000      -72.0 

The clippers used in these tests were identical with those used in the 
single tone tests. 
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APPENDIX II 
ARTICULATION TEST DETAILS 
1. Word lists. 
Below are shown two examples of the phonetically balanced word lists 
used in the articulation tests. 

Word List #17                     Word List #32 

 1. flag      26. read             1. fast        26. rouge 
 2. thank     27. year             2. soak        27. wise 
 3. chess     28. Weit             3. clog        28. pad 
 4. club      29. MOoÉ             4. did         29. judge 
 5. phone     30. smart            5. roast       30. sigh 
 6. odd       31. give             6. retch       31. dia 
 7. birth     32. Cud              7. beard       32. eye 
 8. carve     33. mass             8. "eltek      33. pew 
 9. boost     34. root             9. cart        34. rout (rowt) 
10. grace     35. throne          10. joke        35. souse 
11. foe       36. “itch           11. gang        36. fátr 
12. weak      37. wipe            12. alt         37. wash 
13. arch      38. clown           13. ace         38. crate 
14. gate      39. sip             14. hump        39. seed 
15. Steen     40. wild            15. mow (mo)    40. walk 
16. crowd     41. spud            16. bare        41. skid 
17. troop     42. ice             17. duke        42. lid 
18. beef      43. key             18. through     43. pack 
19. nerve     44. toad            19. puss        44. theme 
20. with      45. noose           20. web         45. quip 
21. fume      46. rude            21. get         46. salve 
22. Bit       47. pact            22. brass       47. robe 
23. fuse      48. than            23. gob         48. slush 
24. ten       49. fluff           24. slice       49. flash 
25. nuts      50. chest           25. ramp        50. cork 


2. Peak list words. 


Word List    Peak list word    Relative amplitude 
    1.          bask               1.0 
    2.          perk               1.0 
    3.          fern               i 
    4.          start              1.05 
    5.          thrash             1.15 
    6.          check              4.2 
    7.          rack               1.0 
    8.          cloak              ZI 
    9.          good               1.05 
   10.          thud               EEL 
   11.          kept               0.95 
   12.          kept               1.0 
   13.          scout              0.90 
   14.          dope               1.0 
   15.          dumb 
   16.          look 
   17.          ditch 
   18.          ditch 
   19.          lap 
   20.          put 
   21.          aull 
   22.          crutch 
   23.          out 
   24.          foot 
   25.          soap 
   26.          dead 
   27.          wreck 
   28.          tire 
   29.          shock 
   30.          thorn 
   31.          AA 
   32.          route 

3. Listener's average rank on all tests. 

Section 8 Tests 

Listener     A     B     C     D     E 
Ave. Rank    5-9   2.4   5.9   225   2.5 

Section 9 Tests 

(Listener ranks not legible.) 

4. Position's average rank on all tests. 

Section 8 Tests 

Position     1     2     3     4     5 
Ave. Rank    3.0   282   3.3   SES   3.7 

Section 9 

Not done 

5. The tests were given in random order. The list below shows the order 
in which the tests were given. 

Section 8 

37 39 25 41 
38 10  9 32 
35 24 19 30 
22 27 40 12 
18  4 29 31 
17  7 14 42 
71  6 15 
 2 36  3 
16 34 11 
13  5 33 
21 20 28 
12 23 26 

Section 9 

(Test order not legible.) 



6. The table below shows further details of the articulation testing 
described in section eight. The clipper designation 1N34A/0 means the 
1N34A diode with zero bias. The 1N34A/1 means the 1N34A diode with one 
volt reverse bias. 

Test  Listener Scores         Ave.  Clipper   C    A    Word Lists 
 #    A   B   C   D   E   F         Used      db   db   Used 
1 50 55 53 50 67 57 55 1N34Ł4A/O0 12 18 25,26 
2 42 Me 3 43 43 42 Hl " " © 27,28 
3 21 WAG 27 24 26827 26 " " 6 1,9 
4 14 12 12 2 18 14% 15 tt tt 3 7,9 
5 53 69 67 65 69 68 65 t 24 18 12,14 
6 48 47 54 55 57 47 51 rt " 12 3,5 
7 27 30 28 33 28 34 30 " tt 6 SE: 
8 18 18 16 20 20 25 20 " " 3 6,8 
9 82 88 85 88 82 84 85 " 33 “1815,21 
10 66 68 63 75 70 63 68 " " 12 18 IG 
11 58 62 54 61 63 55 59 " " 6 2,6 
12 36 43 21 40 53 37 39 lia 2 tt 3 822706 
13 61 63 59 71 53 67 62 1N34A/1 12 18 31, 32 
14 51 80 2 42 43 33 42 " O 30,31 
15 ISO. 32° 23 17 w 20 rt d 6 1,10 
16 10 5S) 10 18 IE is M rt u 3 29,30 
17 73 81 67 81 83 72 76 rt 24 18 -203 
18 56 62 53 57 54 39 54 " R 12 17,19 
19 49 64 40 44 51 #49 50 " 7 664420522 
20 19 28 25 3 34 21 25 rt G 3 15,18 
21 89 88 85 89 92 91 89 tt 33 18 32) 
22 70 83 78 91 84 84 92 " w 12 "STONIE 
23 55 75 75 69 72 61 68 tt 5 "6 11,19 
24 58 TESIS 40, 56 54 35 51 r" " 3 18,20 
25 67 78 71 62 71 79 71 1N69A/0 12 18 13,17 
26 57 55 45 49 52 49 50 rt ee 12 SG 13 
27 20 28 23 45 23 30 28 Ç J 6 22,24 
28 21 ÆT 229 31 28 22 23 a ll 3 “Tiss 
29 86 82 70 80 79 77 79 " 24 18 29,27 
30 jie seo, 81 81“ 82 77 w RTZ 27,21 
31 47 66 38 57 60 57 45 " tt 6 24,30 
32 26 30 30 37 40 29 32 tt " a 1 
33 83 90 88 94 87 90 89 " 36 18 4,8 
34 59 80 66 75 71 71 70 rt ue > 12 8,10 
35 45 36 37 WM 4g 48 47. " Ç 6 29,32 
36 32 43 33 50 37 40 39 d m 3 4,6 
37 53 60 55 6l 6l 63 59 NONE O 18 30,31 
38 25 26 24 36 29 25 28 n 777] O 
39 16 MB 17 9 19 10 15 " t 6 1,2 
40 40 56 35 52 54 56 49 " " 18 24,8 
41 25. 25 8 27 -28 25 26 d u leie 20) 
42 6 10 4 9 11 6 9 " M 6 6,10 
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Tests 37, 38, and 39 were composed of speech with no processing at all 
and were used as the dummy training tests. Tests 40, 41, and 42 consisted 
of unclipped but filtered speech. 


7. Equipment set up for recording tests of section eight. 

(Block diagram; legible labels: TAPE RECORDER, CLIPPER, FILTER 
300-3000 Hz., TAPE RECORDER.) 

8. The form shown on the next page was used for all listening tests. 

(Listening test form: spaces for NAME, TEST #, and POSITION #, followed 
by numbered answer blanks 1 through 100.) 

9. Equipment arrangement for tests of section 9. 

(Block diagram; legible labels: NOISE GEN., AMPLIFIER NO. 1, AMPLIFIER, 
3K LOAD.) 

Amplifier #1 was the amplifier section of an ME-6D/U multimeter, with a 
flat response from 15 to 250,000 Hz. Amplifier #2 was a Hewlett-Packard 
450A amplifier with a flat response from 5 Hz. to 1 MHz. 

The 3 kilohm load consisted of six 300 ohm headsets each connected across 
a 500 ohm L-pad. The L-pads were connected in series, thus enabling each 
listener to adjust volume and still present a constant 3000 ohm load to 
the circuit. 

10. The table below shows further details of the articulation testing 
described in section nine. 

Test   Listener Scores           Ave.   C     A     Word Lists 
 #     A    B    C    D    E            db.   db.   Used 
 1*    67   63   70   60   59    64      0    24    152 
 2*    42   33   40   36   37    38      0    13    739 
 3     37   95   oi   95   89    93     12    36    20520 
 4     86   88   83   85   72    85     12    18    24,25 
 5     69   61   67   65   53    63     12    12    30532 
 6     51   54   49   52   47    51     12     6    12,13 
 7     46   38   35   42   37    40     12     3    NET 
 8     94   91   90   92   85    90     24    30    4,7 
 9     87   87   89   95   81    88     24    18    8,9 
10     80   80   78   66   75    76     24    12    del pues 
11     65   55   76   67   58    64     24     6    12,14 
12     48   54   59   51   42    51     24     3    22,24 

*These tests were used as training tests. 
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APPENDIX III 
THE MANN-WHITNEY U TEST (23, 14) 
1. Description. 

The Mann-Whitney U test is used to determine whether two independent 
sets of samples have been drawn from the same population or not. The null 
hypothesis, H_0, is that the two sets have the same distribution. The 
alternative hypothesis, H_1, is that one set is stochastically larger or 
smaller than the other. We accept H_1 if the probability that one single 
score from one set is larger or smaller than the other is not 1/2. 

2. Method. 

Call one set of scores X with scores x_1, x_2, ..., x_m and the other 
set Y with scores y_1, y_2, ..., y_n. First, the two sets of scores are 
combined and the order statistic formed. Then one set, X or Y, is chosen 
to form the parameter U. The value of U is given by the number of times 
that a score in the set, say X, follows a score from Y. A table is 
consulted giving for each set of m and n the probability that U is less 
than or equal to the value found, if H_0 is true. A significance level, 
α, is chosen. If the value found from the U-test table is greater than α 
then we say that the sets X and Y came from the same population, or that 
H_0 is true. Conversely, if this value is less than the α chosen, then we 
say that H_1 is true or that X and Y are from different populations. If 
the U = U_1 calculated is greater than mn/2, then we use U_2 = mn - U_1 
as the value of U for the table. 

3. Examples. 

Choose α = 0.01. This means that it is desired that the values of 
U should be so small that the probability of their occurrence under H_0 is 
less than or equal to 0.01. 

Tests 12, 24 of section eight: 

X = test 12 scores = 36, 43, 21, 40, 53, 37. 
m = n = 6 
Y = test 24 scores = 58, 51, 49, 56, 54, 35. 
Order statistic:       21, 35, 36, 37, 40, 43, 49, 51, 53, 54, 56, 58. 
Set each belongs to:   X   Y   X   X   X   X   Y   Y   X   Y   Y   Y 
To find U, use set X:  U = 0 + 1 + 1 + 1 + 1 + 3 = 7 

Table J on page 271 of Siegel gives P(U ≤ 7 | H_0) = 0.092. 
This is greater than α so H_1 is rejected. 

Tests 7, 19 of section eight: 

X = test 7 = 27, 30, 28, 33, 28, 34. 
m = n = 6 
Y = test 19 = 49, 64, 40, 44, 51, 49. 
Order statistic:       27, 28, 28, 30, 33, 34, 40, 44, 49, 49, 51, 64. 
Set each belongs to:   X   X   X   X   X   X   Y   Y   Y   Y   Y   Y 
To find U, use set X:  U = 0 + 0 + 0 + 0 + 0 + 0 = 0 

From the table P(U = 0 | H_0) = 0.002, which is less than α, so H_0 is 
rejected. 
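
The two examples above can be reproduced without a table by counting 
every possible way of dividing the twelve pooled scores into two sets of 
six. The sketch below is our own illustration and not the procedure in 
Siegel: it assumes a two-sided test in which the one-tail probability is 
doubled, and it counts a tie between an X score and a Y score as one 
half, a convention that is not needed for the data above. 

```python
from itertools import combinations


def u_statistic(x, y):
    """Number of times a score in x follows a score in y (ties count 1/2)."""
    return sum((yj < xi) + 0.5 * (yj == xi) for xi in x for yj in y)


def mann_whitney_exact(x, y):
    """Return the smaller U and an exact two-sided probability under H_0."""
    m, n = len(x), len(y)
    u = u_statistic(x, y)
    u_small = min(u, m * n - u)          # use mn - U when U exceeds mn/2
    pooled = sorted(list(x) + list(y))
    hits = total = 0
    for idx in combinations(range(m + n), m):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(m + n) if i not in chosen]
        hits += u_statistic(xs, ys) <= u_small
        total += 1
    return u_small, min(1.0, 2.0 * hits / total)


test_12 = [36, 43, 21, 40, 53, 37]
test_24 = [58, 51, 49, 56, 54, 35]
test_7 = [27, 30, 28, 33, 28, 34]
test_19 = [49, 64, 40, 44, 51, 49]

print(mann_whitney_exact(test_12, test_24))
print(mann_whitney_exact(test_7, test_19))
```

With the 0.01 level chosen above, the first pair gives a probability 
well above α and the second pair a probability below it, in agreement 
with the conclusions drawn from the table. 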

4. Significance level. 

A significance level of 0.01 was chosen since it was felt that one 
could not be too rigorous considering the relatively unsophisticated method 
of testing and the size of the samples. In the study conducted at Montana 
State College (27) a significance level of 0.001 was used. Lindgren sug- 
gests levels of from 0.05 to 0.1, while Siegel uses levels from 0.001 to 
0.14. Siegel, in discussing significance gives 0.01 and 0.05 as common 
values for this type of data. 

5. Efficiency. The efficiency of this test is quoted by both Lindgren and 
Siegel to be 0.96 asymptotically, and Lindgren quotes Hodges and Lehmann 
as showing that it is always at least 0.864, thus making it one of the 
most powerful of such tests. 
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APPENDIX IV 


DETAILED SCHEMATIC DIAGRAMS OF R-F SPEECH PROCESSER 


1. Detailed schematic of 455 KHz. L-C oscillator: 

(Schematic; legible labels: 2N376, 1.8K, 47K, 0.002, OUTPUT TO BAL. MOD., 
OUTPUT TO PROD. DET.) 

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2. Detailed schematic of balanced modulator: 


(Schematic; legible labels: INPUT, 10K, FILTER, 1K, 33K.) 

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3. Detailed schematic of 455 KHz. amplifier and clipper: 


(Schematic; legible labels: MONITOR, USB INPUT, CLIPPED USB OUTPUT, 1K.) 
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